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Abstract 

Estimation of parameter sensitivities for stochastic chemical reaction networks is an important and 
challenging problem. Sensitivity values are important in the analysis, modeling and design of chemical 
networks. They help in understanding the robustness properties of the system and also in identifying the 
key reactions for a given outcome. In a discrete setting, most of the methods that exist in the literature 
for the estimation of parameter sensitivities rely on Monte Carlo simulations along with finite difference 
computations. However these methods introduce a bias in the sensitivity estimate and in most cases the 
size or direction of the bias remains unknown, potentially damaging the accuracy of the analysis. In this 
paper, we use the random time change representation of Kurtz to derive an exact formula for parameter 
sensitivity. This formula allows us to construct an unbiased estimator for parameter sensitivity, which 
can be efficiently evaluated using a suitably devised Monte Carlo scheme. The existing literature contains 
only one method to produce such an unbiased estimator. This method was proposed by Plyasunov and 
Arkin and it is based on the Girsanov measure transformation. By taking a couple of examples we 
compare our method to this existing method. Our results indicate that our method can be much faster 
than the existing method while computing sensitivity with respect to a reaction rate constant which is 
small in magnitude. This rate constant could correspond to a reaction which is slow in the reference 
time-scale of the system. Since many biological systems have such slow reactions, our method can be a 
useful tool for sensitivity analysis. 

Keywords: parameter sensitivity, random time change, Gillespie, Markov process, chemical reaction net- 
work, Girsanov, coupling. 



1 Introduction 



Stochastic models for chemical reactions networks have become increasingly popular in the past few years. 
Their appeal comes from the fact that molecules in a chemical system always display randomness in their 
dynamics. This randomness cannot be ignored when the molecules are present in low numbers, as it can 
have a significant effect on the overall properties of the dynamics. Therefore one needs to consider stochastic 
models to account for this randomness. Recently many problems in Biology [17, 16, 6] and Chemistry [13] 
have been studied using such models. Typically, a chemical reaction network depends on various kinetic 
parameters whose values may be uncertain or difficult to measure with high precision. In such cases, one 
would like to determine how sensitive a given output of the system is to small changes in the parameter 
values. If an outcome is highly sensitive, then greater time and effort may be invested in determining 
that parameter accurately. Sensitivity analysis can also be used in fine-tuning a certain output (see [8]) or 
understanding the robustness properties of a biological system (see [23]). 

Consider a system consisting of d chemical species. We assume that the system is well-stirred and hence 
its state at any time can be described by a vector in Nq whose i-th component is the non-negative integer 
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corresponding to the number of molecules of the i-th species. These chemical species interact through K 
predefined reaction channels and every time the j-t\i reaction fires, the state of the system is displaced by 
the d-dimensional stoichiometric vector Q € 7L d . Given that the state of the system is x, the rate at which 
the j-th reaction fires is given by the propensity function Xj (x) . In a stochastic setting, such a chemical 
reaction network can be modeled by a continuous time Markov process over Nq. See [2] for a survey of such 
models. It can be shown that the probability mass function of this Markov chain evolves according to the 
chemical master equation (CME). Solving the CME analytically or even numerically is usually quite difficult 
except for very simple systems (see [18]). However it is easy to generate the sample paths of this process 
using Monte Carlo simulations (see [10, 9]). 

Now suppose that the propensity functions of the reaction network depend on a scalar parameter 9. 
We denote these propensity functions by Xj(x,9) for j = 1,...,K and the resulting Markov process by 
{Xg(t) : t > 0}. Given a function / : Nq — > K and an observation time T > 0, our goal is to compute the 
quantity 



Here f(Xg(T)) is the output of interest and Sg(f, T) evaluates how the expected value of this output changes 
with infinitesimal changes in the parameter 9. Since simulating the paths of this Markov process is easy, 
many methods use a finite difference scheme such as 



for a small h to estimate Sg(f,T). The processes Xg and Xg+h can either be simulated independently (see 
[11]) or they can be coupled in an intelligent way to reduce the variance of the associated estimator (see 
[20, 1]). Finite difference schemes are easy to implement but they have a few drawbacks. They introduce 
a bias in the sensitivity value and one cannot recover the exact value Sg (/, T) , even by taking infinitely 
many samples. In most cases, the magnitude and sign of this bias is unknown which may cause problems 
in certain applications. One can reduce the size of the bias by picking a small h. However for small values 
of h, the variance of the sensitivity estimator can blow up (see the discussion in [20]), making it necessary 
to generate extremely large number of samples to obtain the sensitivity estimate within a tight confidence 
interval. Hence there is a clear trade-off between the bias and the computational cost which implies that such 
schemes are efficient as long as one is willing to tolerate some amount of (unknown) bias. Another source of 
difficulty with finite difference schemes is that to estimate the sensitivities with respect to several parameters, 
one needs to estimate it with respect to each parameter separately. This can be quite cumbersome for large 
networks. One can avoid this problem by using another approach which estimates the sensitivity value using 
pathwisc differentiation (see [22]). In this method, the problem of estimating the sensitivity is regularized in 
the sense that 



is computed as a proxy for Sg (/, T) . Here w is the half-width of the regularizing window. With this 
scheme, the sensitivities with respect to multiple parameters can be estimated simultaneously. However it 
also produces a biased estimate and its performance depends crucially on the parameter w which has to be 
determined carefully to achieve the desired level of accuracy. 

If one is interested in finding an unbiased estimate for the parameter sensitivity, then for this purpose 
there is only one approach in the existing literature. This approach was proposed by Plyasunov and Arkin 
in [19] and it relies on the Girsanov measure transformation. Henceforth we shall call this method as the 
Girsanov method. It has the advantage of being computationally easy to implement and unlike the other 
methods mentioned above, one does not need to tune other parameters (like h or w) to achieve the required 
statistical accuracy. Moreover the sensitivities with respect to several parameters can be estimated together 
in a single run. Since the method is unbiased we can get better and better estimates simply by taking larger 
and larger samples. However in many situations the estimator produced by this method has a large variance. 
One such situation that commonly arises is when the sensitive parameter 9 is a reaction rate constant which 
is small in size (see Example 2.4). In this case, the variance of the sensitivity estimator is large, making it 
difficult to estimate the sensitivity efficiently. We shall discuss this issue in greater detail in the next section. 



Sg(f,T) = -E(f(Xg(T))). 



(1.1) 



Sg. h (f, T) = — E (f(Xg +h (T)) - f(Xg(T))) , 



(1.2) 
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The above discussion shows that it would be desirable to have a new method, which provides an unbi- 
ased estimate for the parameter sensitivity and also performs better than the Girsanov method in certain 
situations. The goal of this paper is to present such a method. Using the random time change representation 
of Kurtz (see Chapter 7 in [7]) we first derive an explicit expression for Sg(f,T) (defined by (1.1)). This 
expression is derived by exploiting the coupling that was introduced in [1] to construct an efficient estimator 
for Sg,h{f,T) (see (1.2)). Based on this expression for Sg(f,T) we construct an unbiased estimator for the 
parameter sensitivity. Our method has two main advantages over the Girsanov method. It requires much 
fewer number of samples to produce the desired estimate and its performance does not deteriorate as the 
magnitude of the sensitive parameter gets smaller. In fact, our method can also be used to compute the 
sensitivity with respect to a rate constant that is set to 0. This can tell us how sensitive a given output 
is to the presence or absence of a certain reaction channel and this information can help us weed out the 
redundant reactions in a reaction network. Such a computation is not possible with the Girsanov method. 
Our method also has its disadvantages in comparison to the Girsanov method. It is harder to implement, 
requires more memory to run and the computational cost of generating each sample from the required dis- 
tribution is high. Our results indicate that our method can be considerably more efficient than the Girsanov 
method, while computing the sensitivity with respect to a reaction rate constant which is small in size. This 
would correspond to the reaction which is slow in the reference time-scale of the system (determined by the 
observation time period [0, T]). As mentioned in the preceding paragraph, this is one of the situations where 
the Girsanov method performs poorly. Since many biochemical networks have wide variations in the rate 
constants of the constituent reactions, our method could be a useful tool for sensitivity analysis. 

In our opinion, the main contribution of this paper is to present a new formula for the parameter 
sensitivity. This formula expresses the sensitivity as an expectation of a certain random variable which 
depends on the path of the underlying Markov process. However we shall see in the next section, that 
evaluating this random variable requires us to compute several expectations of functions of our Markov 
process at various times and various initial states. In principle we can compute such expectations as and 
when they are needed by simulating many new paths of the Markov process, but this approach can render the 
problem computationally intractable for most systems of interest. To circumvent this issue we devise a Monte 
Carlo scheme in which all the required expectations are estimated using a fixed number of auxiliary paths. 
This approach is outlined in Section 3. A more detailed description along with full implementation details is 
given in [12]. Finally we would like to mention that our sensitivity formula can be used for any continuous 
time Markov process over a discrete lattice. Other than stochastic reaction networks, such processes arise 
naturally in queuing theory and population modeling. 

This paper is organized as follows. In Section 2 we present our main result which gives a formula for the 
parameter sensitivity. Through a couple of examples we motivate why the estimator based on our formula 
can perform better than the Girsanov method in certain situations. In Section 3, we formally describe our 
method for estimating the parameter sensitivity and compare its performance with the Girsanov method. 
We also remark how our method fits into the grand scheme of sensitivity analysis for stochastic reaction 
networks. In Section 4 we prove the result mentioned in Section 2. Finally, in Section 5 we conclude and 
present directions for future research. 



Recall the description of the chemical reaction network from the previous section. We suppose that the 
propensity functions depend on a scalar parameter and hence denote these functions by Xj(x,8) for j = 
1, . . . , K , In a stochastic setting, the dynamics can be modeled by a continuous time Markov process whose 
generator 1 is given by 



1 The generator of a Markov process is an operator which specifies the rate of change of the distribution of the process. See 
Chapter 4 in [7] for more details. 



2 The Main Result 



K 




(2.3) 
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where / is a bounded function from Nq to R. Let {Xg(t) : t > 0} be a Markov process with this generator. 
The random time change representation of Kurtz (see Chapter 7 in [7]) allows us to write 

X e (t) = Xe{0) + J2Y k X k (X e (s),6)ds S j Ck, (2.4) 

where {Y k : k = 1, . . . , if} is a family of independent unit rate Poisson processes. We define the function A 
as the sum of propensities 

K 

X (x,6)=J2Mx,0). (2.5) 

i=i 

Throughout the paper || • || will refer to the standard 1-norm on M. n given by ||x|| = Y^i=i \ x i\- Moreover the 
vector 1 will denote the vector of all ones in R™. We now state a condition on real- valued functions on our 
state space. 

Condition 2.1 A function f : Nq — > R satisfies this condition if there exist constants C, r > such that 

1/0)1 < c ( l + ND /or a« x 6 No- 
We now state a condition on the propensity functions. 

Condition 2.2 We say that the propensity functions Ax, ■ ■ ■ , Ax satisfy this condition at the parameter value 
6, if the following is true. 

(A) For any fixed x € Nq, each Afe(x, ■) is twice- continuously differ entiable in a neighborhood of 9. 

(B) For each k, the functions Xk(-,0) and d\k(-,0)/d9 satisfy Condition 2.1. Moreover there exists an 
e > such that sup^ e ^ s _ e 9+e - ) \d 2 Xk{-,£,)/d6 2 \ also satisfies Condition 2.1. 

(C) For any k and x € Nq, if Xk(x,9) > then the vector (x + £&) has all non-negative components. 

(D) Let P be the set of indices of those reactions which have a net positive affect on the total population. 
That is 

P={A; = l,2,...,^:(T,a>>0}. (2.6) 
Then there exists a C\ > such that for all x £ Nq we have 

K 

]T X k (x,0) < C x (l + \\x\\). 

k=i,keP 

Parts (A) and (B) are technical requirements for our main result. Part (C) prevents the Markov process 
from leaving the state space Nq. Part (D) is needed to ensure that all the moments of the Markov process 
can be bounded uniformly in time (see Lemma 4.1). Informally, this condition says that all the reactions 
that add molecules into the system have orders or 1. If there is a compact set S such that for each k, 
Xk(x, 9) = for all x S, then part (D) is trivially satisfied. 

If {Xg(t) : t > 0} is a Markov process with generator Ag then let 

*g( X , f,t) =E (f(Xg(t))\Xg(0) = x) . (2.7) 

Also for each k = 1 , . . . , K define 

R e (x, /, t, k) = [ (* e (x + CkJ, s) - * e (x, /, s) - A ( J(x)) e-^ e ^ds. (2.8) 
Jo 

We are now ready to state our main result. It gives an explicit expression for Sg(f,T) which is defined by 
(1.1). 
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Theorem 2.3 Pick a x € N$ and a function f : Ng — > R satisfying Condition 2.1. Assume that the 
propensity functions Ai,...,Ajf satisfy Condition 2.2 at 9. Let {Xg(t) : t > 0} be the NQ-valued Markov 
process with generator Ag starting at xq and let o~i be its i-th jump time 2 for i = 0,1,2,.... Then the 
sensitivity value Sg{f,T) is well-defined and is equal to 

Se(f, T) = QQ* e (x J, T)=E (s (f, T)) 



where 

= E [[ ^f'^ ] A Ck f(Xe(t))dt (2.9) 

+ £ aM y'\ w4/,r-^,fc)). 

i=0:ai<T ) 

The proof of this theorem is given in Section 4. Note that the random variable sg(f, T) can be evaluated 
from the realizations of the Markov process {Xg(t) : t > 0}, provided that we can calculate the values of the 
function Rg that are required. In almost all the cases these values cannot be calculated explicitly and they 
have to be estimated during the simulation run. We shall deal with this issue in detail in the next section. 
For now assume that we "know" the function Rg and hence using independent realizations of {Xg(t) : t > 0} 
we can generate N independent samples s < g\f,T), . . . ,Sg(f,T) from the distribution of sg(f,T). Then 
Sg (/, T) can be estimated as 

1 N 

Sg(LT) = -J2 s e ] (f,T) (2.10) 

i=i 

and the variance of this estimator is 

Var(&(/,T)) - Var ^5>S°(/. T )1 = ^Var(* 9 (/,T)). (2.11) 

Generally statistical quantities are estimated along with a confidence interval which indicates the accu- 
racy of the estimate. The half-length of the 95% confidence interval for the above estimator is given by 



1.96y Vax(S$(f, T)) — 1.96 y/Vax{sg(J, T))/\/~N. Since the variance of sg(f,T) is unknown we can substi- 
tute it by the sample variance v(s$(f,T)) evaluated as 



N 



v(sg(f,T)) = J ±^J2^\f,T) - Sg(f,T)) 2 . (2.12) 

i=l 

Note that the number of samples (N) needed for estimating Sg(f,T) within a certain confidence interval, is 
directly proportional to the variance of sg(f,T). 

Now we briefly discuss the Girsanov method presented in [19] and the problems associated with it. 
Suppose that the sensitive parameter 9 appears linearly in only one propensity function Afe for some fco € 
{1, . . .,K}. In this case, it is shown in [19] that Sg(f,T) = E(sg(f,T)) with sg(f,T) given by 

sMn = fmxMtn t (2 . 13) 

where Mg°(t) = (Ng°(t) — J* Xk (Xg(s),9)ds) is a martingale and Ng°(t) is the number of times reaction 
kg fired until time t. Observe that this formula cannot be used to determine Sg{f,T) for 9 = even though 
Sg(f,T) (as defined by (1.1)) makes perfect sense at 9 = 0. Moreover, as the next example illustrates, the 
random variable sg(f, T) (given by (2.13)) can have a large variance for small values of 9 making it necessary 
to take very large sample sizes for estimating accurately. 



2 We define ctq = for convenience. 
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Example 2.4 Consider a pure birth process in which a chemical species S is created at rate 9 



As. 

The population at time t is given by Xg(t) £ Nq. We assume that Xg(0) — 0. There is only reaction, 
with propensity function X\(x, 6) = 9 and stoichiometric vector d = 1. For this system we can write 
Xg(t) = Y(8t) where Y is a unit rate Poisson process. Let / : No —> K be given by f(x) — x. Then for any 
T > 

S e (f,T) = ±E(f(X e (T))) = ±E{X 9 (T)) = ^E(Y(0T)) = ^ = T. (2.14) 

The last equality follows from the fact that Y(9T) is a Poisson random variable with rate 9T. According to 
the Girsanov method, Sg(f,T) = E(sg(f, T)) where sg(f,T) can be evaluated using (2.13) as 

s e (f,T) = ±Y 1 (6T) (Y 1 (9T)-9T). 

Using the moments of Poisson random variables one can verify that as expected, E(sg(f, T)) = T, but its 
variance is 

Var(s e (/,T)) = . 

This shows that for 9 w 0, Var (sg(f, T)) is very high and hence the Girsanov estimator will perform poorly. 
Also for 9 = we cannot determine S$(f, T) with this method even though Sg(f, T) is well-defined, as shown 
by (2.14). 

Recall the definition of ^g and Rg from (2.7) and (2.8). For this example, ^g(x, f, t) = x + 9t and hence 
Rg{x, f, t,l) = for all x and t. Therefore Theorem 2.3 says that Sg(f, T) = E (s e (f, T)) with s g {f, T) = T 
(see (2.9)). Hence the estimator based on Theorem 2.3 has variance ! Moreover this estimator works for 
9 = 0, unlike the Girsanov method. □ 

Note that for small values of 8, the reaction in Example 2.4 will have very few firings in the observation 
time period [0,T]. Hence this example indicates that if the sensitive parameter 9 is the rate constant of a 
reaction which is slow in the reference time-scale of the system, then the Girsanov method can be highly 
inefficient, while an estimator based on Theorem 2.3 can perform much better. We illustrate this through 
another example. 

Example 2.5 (Single-species birth-death model) : Consider the process in which a chemical species 
iS is created and destroyed according to the following two reactions: 

0^5 A 0. 

The population at time t is given by Xg(t) £ Nq. Conditioned on Xg(t) = x, the propensity functions for 
the first and second reactions are Xi(x,9) = 1 and \2{x, 9) = 9x respectively. We assume that Xg(0) = 0. 
Let fix) = x, then 

*g(x, f,t) =E(Xg(t)\Xg(0) = x) = xe- dt + - e- et ). 

a 



This allows us to compute 



rt / *| p -{l+6x)t\ / -8t _ -(l+6x)t\ 

«,(,,/.«.»> -j[ (i- e -«>-<«.K.-.),,= (L^_)_(l_^_). (2 , 5) 

According to Theorem 2.3 we have Sg(f,T) = E(sg(f,T)) where 

sg(f,T) = - X e {t)dt+ ]T XoipdRoiXoia&fiT-ai,*). (2.16) 

"'° i=0:<n<T 



G 



Tabic 1 : Comparison of sample variances 
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Method 


T 


1 
1 


O 


1U 


on 


0.1 


G irs3.no v 
Our 


10.7365 
0.2905 


2303.39 
20.473 


20698 
90.8017 


112758 
326.391 


0.01 


Girsanov 
Our 


99.6366 
0.3343 


33719.1 
37.6357 


489925 
281.547 


6.6203 x 10 6 
1923.5 


0.001 


Girsanov 
Our 


302.818 
0.3447 


373393 
41.6996 


5.76119 x 10 y 
329.948 


7.81587 x 10 Y 
2519.29 


0.0001 


Girsanov 
Our 


10004.2 
0.3364 


3.85596 x 10 b 
41.1659 


6.50532 x 10 Y 
334.106 


7.99393 x 10 B 
2620.81 



Equation (2.15) gives us the explicit formula for Rg. With this in our hands, we can generate samples from 
the distribution of sg(f,T) and compute the sample variance v(sg(f,T)) (see (2.12)). We can do the same 
for sg(f,T) given by the Girsanov method (see (2.13)). In Table 1 we present a comparison of the sample 
variances obtained by both these methods for 9 = {0.1,0.01,0.001,0.0001} and T = {1,5,10,20}. These 
results allow us to make the following observations. For all the values of 9 and T that are considered, our 
method gives a much lower sample variance than the Girsanov method. Moreover the variances obtained 
by the Girsanov method increase significantly as 9 decreases in size, while for our method they remain the 
same. We discussed before that the efficiency of an estimator is inversely proportional to its variance. The 
results in Table 1 reinforce our claim about the inefficiency of the Girsanov method while estimating the 
sensitivity with respect to the rate constants of slow reactions. In such situations, the estimator based on 
Theorem 2.3 can be more efficient. □ 



3 Algorithm for estimation of the parameter sensitivity 

In Example 2.5, we had an analytical expression for the function Rg that allowed us to generate samples 
of sg(f,T) using (2.9). For most examples of interest, we would not have this luxury. Therefore we need 
to find a way to estimate the required values of the function Rg "on the run". We deal with this issue 
now and present an algorithm to generate samples for estimating the parameter sensitivity. Note that the 
method we propose is just one way to estimate the sensitivity using Theorem 2.3. There could be better 
approaches, perhaps using smarter data structures, that can do the job more efficiently. We encourage the 
research community to explore this issue. 

Let {Xg(t) : t > 0} be a Markov process with generator Kg and initial state xq. Its jump times are given 
by do, cti, . . . , where <r = 0. Our goal is to estimate the quantities of the form Rg(Xg((Ti), f,T — at, k) that 
appear in (2.9) and then evaluate sg{f, T). Moreover to preserve the unbiasedness of our method, we require 
these estimates to be unbiased as well. Estimating several such quantities in parallel is quite challenging 
and hence the algorithm we are about to present is much harder to implement than the Girsanov method. 
However once implemented, our algorithm can offer considerable speed-ups over the Girsanov method in 
certain situations. We shall demonstrate this later through examples. We first make the formula (2.9) more 
amenable for calculations. 



Define 



r) = max{i > : o % < T] (3.17) 



At , (^i-^) fori = .,*/-! (31g) 



and let 



(T — er r) ) for i = r\. 

Then 

d\ k (Xg(t),9) ^aXfcpr^)^) 



j = ± \J(M«,))M,. (3.19) 
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From (2.8) we know that evaluating Rg requires us to compute exponentially weighted integrals, which can 
be cumbersome. To avoid this issue we do the following. If Xo(Xg(ai),9) > 0, then let 7 , be an exponential 
random variable with rate Xo(Xg(ai), 9) and let 

ati= (T-ai- 7i ) + . (3.20) 

Conditioned on Ao(-X^(oi), 9), the random variable 7 j is independent of everything else. Then for any £ € Nq 



3 



T-c 



= E(^ e {X e (a i ) + C,f,a i )\Xg(a i ),a i ). 
Hence for any k = 1, . . . , K we have 



Rg(X e (<Ti), f,T - (Ti,k) 



\ {Xe(ai),6) 
E (^g(Xg(ai), /, ai)\Xg(ai), a t ) - \J(X (tT t )) 



E (V e (X e (ai) + Cfc, /, ai)|^e(<Ti),<7i) 



(3.21) 



On the other hand if Ao(X6/((7i), 9) = 0, then Xe(CTj) is an absorbing state for the Markov proces, which 
implies that ^g(X {(T i ), f,t) = f(X e (a t )) for all t > 0. Therefore 



where 



Rg(Xg(a z )J, T - ai, k) = IeiXgia,) + ( k ,f, T - at) - (T - <Ti)f{X e (<Ti) + Cfc), 



I e (x,f,t) = / V e (x,f,s)ds. 



(3.22) 



(3.23) 



Let ^g(Xg{<7i)+C, k , f, a l ),^g(Xg(a l ), /, ai) and Ig(Xg(ai)+( k , f, T—ai) be unbiased estimators for \fr e (X e ((7 i )+ 
Cfc, /, a^, *e(X fl (cr i ), /, and Ig(Xg(<Ti) + ( k , /, T-crj) respectively. Assume that given X e (a l ) and a i: these 
estimators are independent of the sigma field F ai , where {Ft} is the filtration generated by {Xg(t) : t > 0}. 
If we dchne 

Rg{Xg((Ji) 1 f,T - (Ji,k) 

( ^e{X e {a i )+C. k J, az )-^ B {Xe{ai),f,a. l )-^ ik f{X B (a l )) , . , , , 

= ) x (x e (v z ),e) h lf MXe{Vi),v) > 

\ Tg(Xg(ai) + C k ,f,T-ai)-(T-ai)f(Xg(ai)+Ck) otherwise , 

then due to (3.21) and (3.22) we must have 

Re{X d (a i )J,T-(Xi,k)=E(R 9 (Xg{a i ),f,T-a i ,k)\Xg(a i ),a i ' S j . (3.24) 

In other words, given Xg(ai) and ai, Rg(Xg(ai), f,T — ai,k) is an unbiased estimator for Rg(Xg(ai), /, T — 
<7i,k). Observe that if Ao {Xg {ai ), 9) = for some i, then Xg{ai) is an absorbing state and hence the next 
jump time cr^+i = 00. Therefore for i = 0, . . . , r\ — 1, Xg(ai) can never be an absorbing state. 
Define sg(/, T) as 



A" 



s e (f,T) = Yl 

k=l 

v -l 

+E 



■77-1 

E 

.1=0 



d\ k (X B (<ri),0) 



m 



A Ck f(Xg(a x )) At 



dX k (X (ai),9) 



\ Q (Xg{ai),9) 



09 



\ (Xg(a t ),9) 
§g(Xg{ai)+( k ,f, ai ) - %g{Xg{ai), f,a t ) 



(3.25) 



d\ k (Xg(a v ),6) t(v t n x\ 1 aj 
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where 

f 1 if Ao(X fl ((T,),e) > 
I otherwise . 



Since s g (f,T) is given by (2.9), the relations (3.19) and (3.24) imply that E(sg(f,T)) = E{sg(f,T)). Hence 
due to Theorem 2.3 we have 



S e (f,T) = E(Mf,T)). (3.26) 

which shows that we can estimate Sg (/, T) by computing the sample mean of several independent realizations 
of the random variable Sg(/, T). Now we outline an algorithm to obtain such realizations. 

Observe that the evaluation ofsg(f, T) requires us to estimate either ^fg(X$(ai)+(k, f, cti) an d ^g(Xg(ai), f, on) 
or Ig(Xg(ai) + Cfc, f,T — (Ji) for many values of i and k. These quantities can be estimated using several 
paths of the Markov process with generator Ag and initial states Xg{ai) + Ck or Xgfa). If we generate such 
paths independently for each i and k, then the problem quickly becomes computationally intractable. So we 
need another way. Generally a Markov process representing chemical kinetics is such that its various paths 
visit the same states again and again. We can use this observation to our advantage and try to estimate 
all the required quantities with the same set of paths. For this purpose, we do the following. In addition 
to the base path of the Markov process we also independently generate M auxiliary paths with the same 
initial state. From the base path we determine what quantities need to be estimated and with the help of 
the auxiliary paths we carry out the estimation. Note that if the state x is visited by m paths X\ , . . . , X m 
at times tx,...,t m , then ^>g(x, /, t) and Ig{x, /, t) can be estimated as 

■i m m „t 

9 (xJ,t) = —Y i f(X i (t i + t))fuidTe(x,f,t) = —Y i / /(XiiU + s^ds, (3.27) 

If there is no path that reaches x then ^g(x, f,t) or Ig(x, f,t) cannot be estimated this way. In this case, 
we estimate such a quantity using a single independently generated path. Once all the required quantities 
are estimated we can obtain a realization of se(/, T) using (3.25). 
A rough sketch of the method we just described is given below. 

1. Generate a base path of the Markov process. 

• Store all the quantities in (3.25) that can be directly calculated from the base path. 

• Create a list C with all the quantities of the form ^fg(x, /, t) or Ig(x, /, t) that need to be estimated. 

2. Generate M auxiliary paths of the Markov process. 

• Use these paths to estimate the quantitites in C. 

• If a quantity cannot be estimated with these paths, then estimate it with a single independently 
generated path. 

3. Use the expression (3.25) to evaluate s$(f, T). 

All the paths can be generated using Gillespie's Stochastic Simulation Algorithm [10]. Since our method relies 
on the process visiting the same states again and again, we simulate the paths until time kT (rather than 
T), where k > 1 is an extension factor. The parameters M and k can be used to fine-tune the performance 
of our method. Note that our method would produce an unbiased estimate of the parameter sensitivity 
irrespective of the choice of M and k. 

In implementing the method sketched above, one has to deal with many issues. It is important to store 
all the information properly and perform all the book-keeping that is necessary to estimate the quantitites 
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in C. The efficiency of our method depends crucially on how we store the list C. It has to be done in such 
a way so that when a path being simulated reaches a state x, one can easily determine if a quantity of the 
form ^e{x, f, t) or Ig(x, /, t) is inside C. In our implementation we use a Hashtable (see [4]) for this purpose, 
where the hashing function maps the states in Nq to the indices of a large array. The full details of our 
implementation are contained in [12]. 

From now on we shall refer to the method outlined above as APA, which is an acronym for the Aux- 
iliary Path Algorithm. Even though this method is designed to compute the sensitivity with respect to a 
single scalar parameter 8, we can easily extend it to compute sensitivity with respect to many parameters 
simultaneously. Even though APA is harder to program than the Girsanov method and has larger memory 
requirements 3 , it can be very useful in certain situations. We illustrate this through a couple of examples. 
For both the examples, we set M (number of auxiliary paths) to 50 and k (extension factor) to 3. In all 
our numerical examples, we will estimate the sensitivity Se(f,T) using the minimum sample size N that is 
needed to ensure that the half-length of the 95% confidence interval is below 5% of \Sg(f,T)\, where | • | is 
the absolute value function. In our results, the sensitivity estimate Se(f, T) will be written in the form s ± I 
which means that the 95% confidence interval is equal to [s — I, s + 1] . While presenting the results we always 
indicate the CPU time 4 (in seconds) that was required for the estimation. The CPU time can be taken as a 
measure of the efficiency of a given method. 

Example 3.1 (Single-species birth-death model) : Let us revisit Example 2.5 of a simple birth-death 
process. In that example we "cheated" in the sense that we used an exact expression for the function Rg (see 
(2.15)) to generate the samples for sensitivity estimation. Hence we did not have to go through the complex 
estimation procedure that is required by APA. To compare the performance of the Girsanov method and 
APA for this example, we present the results for sensitivity estimation obtained by both these methods in 
Table 2. These results arc provided for 6 = {0.1, 0.01, 0.001, 0.0001} and T = {5, 10}. 



Table 2: Comparison of results for the Birth-Death model 



8 


T 


Method 


Se(f,T) 


N 


CPU time (s) 


0.1 


5 


Girsanov 
APA 


-8.9671 ± 0.4483 
-8.8021 ± 0.4400 


46271 

448 


0.0449 
0.1094 


10 


Girsanov 
APA 


-25.7869 ± 1.2893 
-26.5226 ± 1.3231 


45885 
270 


0.0851 
0.1579 


0.01 


5 


Girsanov 
APA 


12.1866 ±0.6093 
-12.2895 ±0.6142 


370157 

384 


0.2816 
0.0716 


10 


Girsanov 
APA 


-47.3787 ±2.3689 
-46.5443 ± 2.3234 


334872 
231 


0.4460 
0.0895 


0.001 


5 


Girsanov 
APA 


-12.5555 ±0.6278 
-12.4529 ±0.6218 


3.62 x 10 s 
391 


2.7242 
0.0701 


10 


Girsanov 
APA 


-49.132 ±2.4566 
-50.823 ±2.5409 


3.46 x 10 b 
223 


4.5133 
0.0785 


0.0001 


5 


Girsanov 
APA 


-12.7847 ±0.6392 
-12.6859 ±0.6333 


3.50 x 10' 
449 


25.8511 
0.0773 


10 


Girsanov 
APA 


-50.6981 ± 2.5349 
-50.0198 ±2.4969 


3.28 x 10 Y 
249 


41.9501 
0.0847 



From Table 2 we can make the following observations. Unlike the Girsanov method, the performance 
of APA stays the same as 8 decreases in magnitude. APA is slightly slower than the Girsanov method for 
8 = 0.1, but faster for all other values of 8. In fact for 8 = 0.001 and 8 = 0.0001, APA is more efficient than 
the Girsanov method by a factor of more than 10 and 100 respectively. □ 

3 APA requires memory of a size which scales linearly with the total number of jumps in the Markov process within the time 
period [0, kT]. See [12] for details. 

4 A11 the computations in this paper were performed using CH — h programs on an Apple machine with a 2.2 GHz Intel i7 
processor. 
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Example 3.2 (Gene Expression Network) : We now consider the model for gene transcription and 
translation that appeared in [24]. It has three species : Gene (G), mRNA (M) and protein (P), and there 
are four reactions given by 

G^G + M, MHM + P, M^0 andP^0. 

The first two reactions represent the translation of a single gene into mRNA and the transcription of mRNA 
into proteins. The final two reactions are degradation of mRNA and protein molecules. The rate constants for 
translation, transcription, mRNA degradation and protein degradation are fcp, fcp, 7^ and 7p respectively. 
Typically, a mRNA molecule decays a lot faster than a protein molecule. The half-life of the former is usually 
in minutes (or seconds) while the half- life of the latter is usually in hours. We shall fix fcp = 0.6 min , 
kp = 1.7329 min - and 7^ = 0.3466 min -1 . These values are given in [24] for lacA gene in E.Coli. Our 
sensitive parameter 9 is the protein degradation rate 7p. 

The state of this system at time t is Xg{t) = (Xg^(t), Xg^it)), where X$ and Xg^(t) are the number 
of mRNA and protein molecules respectively. We assume that (Ag^O), Ag j2 (0)) = (0, 0). Define / : Nq — > R 
by f(xi,X2) = X2- We would like to estimate 

So(f, T) = (f(Xg(T))) = ^E(X e . 2 (T)). (3.28) 

We consider five values of 9 : 0.0693 min -1 , 0.0116 min" 1 , 0.0023 min -1 , 0.0012 min" 1 and 0. These 
correspond to the protein half-life of 10 min, 1 hr, 5 hr, 10 hr and 00. In Table 3 wc estimate the sensitivity 
(3.28) using APA and the Girsanov method for two values of T : 5 min and 10 min. Of course for 9 = 0, the 
Girsanov method cannot be applied and so the values are only given for APA. The results in Table 3 have 



Tabic 3: Comparison of results for the Gene Expression Network 



9 


T 


Method 


So 


N 


CPU time (s) 


0.0693 


5 


Girsanov 
APA 


-12.1080 ± 0.6054 
-12.2757 ±0.6136 


331945 
1913 


0.6330 
1.1284 


10 


Girsanov 
APA 


-61.3473 ± 3.0670 
-61.2132 ± 3.0589 


253048 
1277 


1.2661 
1.9463 


0.0116 


5 


Girsanov 
APA 


-13.878 ±0.6939 
-13.821 ±0.6908 


1975557 
1943 


3.5028 
0.9470 


10 


Girsanov 
APA 


-80.8423 ±4.0420 
-82.7886 ±4.1375 


1554726 
1664 


6.7256 
1.9465 


0.0023 


5 


Girsanov 
APA 


-14.6221 ±0.7311 
-14.8203 ±0.7410 


9406555 
2384 


16.259 
1.1224 


10 


Girsanov 
APA 


-89.1083 ±4.4554 
-86.6336 ±4.3314 


7102591 
2236 


29.4725 
2.3528 


0.0012 


5 


Girsanov 
APA 


-14.9095 ± 0.7455 
-14.9071 ±0.7452 


17378930 
12047 


30.0565 
0.9351 


10 


Girsanov 
APA 


-88.5277 ±4.4264 
-86.4873 ±4.3242 


13929778 
1919 


57.6593 
1.9933 





5 


APA 


-15.037 ±0.7518 


1935 


0.8865 


10 


APA 


-83.5049 ±4.1741 


1797 


1.8591 



the same underlying message as in Example 3.1. The performance of the Girsanov method deteriorates as 9 
gets smaller, while the performance of APA remains unchanged. Apart from the biologically unrealistic case 
of 9 = 0.0693 min" 1 (corresponding to protein half- life of 10 min), APA outperforms the Girsanov method 
in all other cases. The degree of outpcrformance is directly proportional to the smallncss of 9. □ 

Examples 3.1 and 3.2 clearly indicate that APA can be much more efficient than the Girsanov method 
for the estimation of sensitivity with respect to the rate constant of a slow reaction in a chemical reaction 
network. 
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With so many methods in the literature for estimating the parameter sensitivity, one may wonder which 
method should be used in a certain situation. We now offer some suggestions. In general if one is willing 
to tolerate a bias in the sensitivity estimate, then the finite difference schemes given in [20, 1] can be 
much faster than both the unbiased methods (APA and Girsanov). Such schemes estimate the quantity 
Se,h(f,T) (see (1.2)), by coupling the processes Xg and Xg + h in an intelligent way. Three such couplings 
are : Common Reaction Numbers (CRN) (see [20]) , Common Reaction Paths (CRP) (see [20]) and Coupled 
Finite Differences (CFD) (see [1]). The bias introduced by these schemes is proportional to h and one may 
be want to reduce the bias by picking a very small h. However as h gets smaller, the variance of the finite 
difference estimator gets larger, making these schemes inefficient in comparison to the unbiased methods. 
To illustrate this point we return to Example 3.2 of the gene expression network. We fix 9 = 0.0116 min 
and T = 10 min. The sensitivity value in Example 3.2 is now estimated using the three finite difference 
schemes : CRN, CRP and CFD, for four different values of h : 10 -2 , 10 -3 , 10 -4 and 10 -5 . We perform these 
estimations using the SPSens software [21]. The results 2 are given in Table 4. One can easily verify that 
the efficiency of each finite difference scheme deteriorates as h gets smaller. From Table 3 we see that for 
this choice of 9 and T, the Girsanov method and APA estimated the sensitivity in 6.7256 sec and 1.9465 sec 
respectively. For higher values of h, the finite difference schemes (especially CRP and CFD) are much faster 
than both these methods. However the situation is reversed for the smallest value h = 10 -5 . Based on the 



Table 4: Efficiency of various finite difference schemes 



h 


Method 


Seji 


N 


CPU Time (s) 


io- 2 


CRN 
CRP 
CFD 


-82.6147 ±4.1478 
-81.2400 ±4.0207 
-80. 1277 ±4.0287 


47500 
2500 
2350 


0.4957 
0.0312 
0.0229 


10" 3 


CRN 
CRP 
CFD 


-81.9867 ±4.0577 
-81.2000 ±4.0314 
-81.7778 ±4.1345 


600000 
20000 
18000 


6.4498 
0.2194 
0.1498 


10~ 4 


CRN 
CRP 
CFD 


-80.1203 ±4.0080 
-83.3846 ±4.0611 
-81.7895 ±4.0499 


6400000 
195000 
190000 


68.2093 
2.1016 
1.5559 


10~ 5 


CRN 
CRP 
CFD 


-83.0656 ±4.0350 
-82.1538 ±4.0239 
-83.6316 ±4.1103 


64000000 
1950000 
1900000 


724.4415 
21.0040 
15.6029 



this discussion, we suggest the following strategy for picking the best method for estimating the sensitivity 
with respect to the rate constant 9 of a particular reaction. If some amount of unknown bias can be tolerated, 
then a finite difference scheme (preferably CRP or CFD) must be used. However if one does not want to 
sacrifice the accuracy of the estimate, then instead of trying to reduce the bias by picking a very small h in a 
finite difference scheme, one should use an unbiased method : APA or the Girsanov method. The results in 
this paper show that APA would be a better choice for small values of 9 while the Girsanov method would 
be more appropriate for large values of 9. This strategy for method selection is heuristically depicted in 
Figure 1. 



4 Proof of the main result 

In this section we will prove our main result Theorem 2.3. We start by proving a simple lemma. 

Lemma 4.1 Let {Xg{i) : t > 0} be a Markov process with generator Ag given by (2.3). Suppose that the 
propensity functions Ai, . . . , Xk satisfy Condition 2.2 at 9. Assume that E (||Ag(0)|j p ) < oo for all p > 0. 
Then we have the following. 

2 In this paper, we always estimate the parameter sensitivity using the minimum number of samples N that are needed to 
ensure that the half-length of the 95% confidence interval is below 5% of the magnitude of the sensitivity estimate. The current 
version of the software SPSens does not allow us to specify such a stopping criterion. Hence by trial and error, we estimate the 
sensitivity using a sample size N which approximately satisfies this criterion. 
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Finite Difference Scheme 
CRP/CFD 



Parameter Size 



Figure 1: Strategy for picking the sensitivity estimation method based on the size of the sensitive parameter 
and the bias that can be allowed. 



(A) For any p > and T > 



Sup E(\\Xg(t)\\ P ) < 00. 

te[o,T] 



(B) If f : Nq — > M is a function satisfying Condition 2.1 then for any t > 

E (/(X e (t))) = E (/(Xe(0))) + E Agf(X e (s))ds 

Proof. To prove part (A) of the lemma we can assume that p is a positive integer. Recall the definition of 
the set P from (2.6). Let 



max fceF (l,Cfc) ifP^0 
otherwise. 



Pick a large M > and let g : Nq — > R be the function given by 



(4.29) 



g(x) = \\xrA(M + q y, 

where a A b := min{a, b}. Since g is bounded, it is in the domain of the generator Ag. Define a stopping time 
t~m by 

r M = inf{t>0:||X 9 (t)||>M}. 
Using Dynkin's theorem (see Lemma 19.21 in [14]) we can conclude that 



tATAJ 



tATM 



E (g(Xg{t A t m ))) = E (g(X e (0))) + E 
= E(g(X e (0)))+M 
For < t < tm, we have ||-Xg(f)|| < M, which implies that 

E(\\Xg(tAT M W)=^(\\MW) 

+ E 



A g g(Xg(s))ds 



K 



J2Wg(s),6)A Ck g(Xg( S ))ds 



k=l 



/•tATM K \ 

J J2 x k(Xe(s),9)(\\Xe(s) + a\\ p -\\Xe(s)\\ p )dsj . 
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Part (C) of Condition 2.2 ensures that when X k (Xg(s), 6) > 0, (X g (s) + ( k ) € Ng and hence \\Xg(s) + ( k \\ = 
(T,X e (s)) + (T,Cfe). This gives us 

E(\\X g (tAr M )\\P) 

(ptATM \ 

/ ^ A fe (X e ( S ),0) + COf - <1,^(*)> P ) ds 
Jo keP J 

< E (\\X e (0)\\P) + 2T<fE U* J2 HMs), 0) {\\X e {s A r M )ir 1 + l) <hj . 
Using part (D) of Condition 2.2 we can find a constant C > such that 

E(\\Xe(tAT M )\\P))<E(\\Xg(0)\n+Ct + C f E (\\Xg(s A tm)\\ p ) ds. 

Jo 

From Gronwall's inequality we obtain 

E (\\Xg(t A tm)\\ p )) < + Ct) e ct . (4.30) 

Using Markov's inequality, for any t > we get 

lim ¥(T M <t)= lim F(\\X e (tAT M )F>MP)< lim ^^=0. 

The last limit is due to (4.30). The above calculation shows that tm — > oo, in probability as M — > oo. 
Since tm is monotonically increasing we must have that tm — > oo a.s. as M — ^ oo. Letting M — >• oo in 
(4.30) and using Fatou's lemma we obtain 



ct 



E(\\X e (t)\\ P )) < I™ E (\\X e (t A T M )m) < (E(\\X e (0)\\ p ) + Ct) e 

M— >oo 

Taking supremum over t € [0, T] proves part (A) of the lemma. For part (B), note that if / satisfies 
Condition 2.1 then due to part (A) we must have that 

E(\f(X e (t))\) < oo and E (J \Agf(X e (s))\ds^ < oo. 

We can assume that / is a positive function. Pick a large M and define a bounded function f M : Nq — > E 
by 

f M (x) = f(x)AM. 

The function /m is bounded and using Dynkin's theorem (see Lemma 19.21 in [14]) we get 
E (f M (X e (t))) = E (f M (X e (0))) + E ( K J o A e f M (Xg(s))ds 

Taking the limit M — > oo and using the dominated convergence theorem proves part (B) of the lemma. □ 

Note that the state space of our Markov process is Nq. Endowing this space with the discrete metric 
makes it a complete and separable metric space. Hence under our topology, any real-valued function on Nq 
is continuous. We now come to the main proof. 

Proof. [Theorem 2.3] Let Xg be the process given by (2.4), with Xg(0) = xq. Pick a h <G R and for any 
X\ , X2 & Ng define 

X k nin (x 1 ,X2,6,h) = \ k (x 1 ,6)A\ k (x 2 ,9 + h) for fee {1,2,..., K} 
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and let \™ m (xi, x%, 9, ft) = Y^,k=i ^T m ( Xl ' X2 ' ®-> Let the processes Xg and Xg + h be given by the following 
time change representations. 

X e (t)=x + J2aY k Qf Xr n (Xe( S ),Xg +h ( S ),9,h)ds^ (4.31) 

K / rt 

' X k (X e (s),9) - \r n (X e (s), X g+h (s), 9, ft) ) ds 



+E^ (1) (/ 
fc=i VJ ° 

%(t) = io + ^(^ ^j\ k nin (X (s),X e+h (s),e,h)ds^ (4.32) 

where {Yfe, K, : k = 1, 2, . . . , K} is a family of independent unit rate Poisson processes. The Yfc's here 
is the same as those in (2.4). These representations define a coupling that was used in [1] to construct efficient 
finite difference estimators for the parameter sensitivity. Note that Xg and Xg + h are Markov processes with 
generators Ag and Ag + h respectively. They both start with the same state xo- Since X k nm (x, x, 9, ft) -4 
Xk(x,0) as ft 0, we must have that Xg and Xg+h converges almost surely to Xg as ft — >■ 0. This 
convergence is in the Skorohod topology on the space Nq. For details on this topology see Chapter 3 in [7]. 
By definition 



E(/(X e+h (T))J -E(/(X e (T)) 
ft 

Recall the definition of the generator Ag from (2.3). Part (B) of Lemma 4.1 gives us the following relationships 



Sg(f, T) = lim — * '- * '-. (4.33) 

h— >0 ft 



E(/(l e (T))) = /(*<)) +E E (/ A fe (X e (i),0)A a /(X e (t))cftJ and 
(f(X 6+h (T))) = /K) + E E (/ A ^ 



(4.34) 

E(/(X fl+A (T))) = /(ar ) + VEl A fc (X e+ ,i(t), 6» + h)A ik f(X 0+h (t))dt I . (4.35) 



Part (A) of Condition 2.2 allows us to write the Taylor expansion below 

Afc(.T,# + ft) = Afc(.T,#)+ft — + y ^ : 

where £ G (0, 9 + ft). Substituting this expansion in (4.35) we get 

rT 



E (f($ g+h (T)j) = /(x ) + E A e /(X e+/l (i))diJ 

+ /*E E (/ T ^%^A a /(X 9+ , W )^ 



Using (4.34) and (4.33) we obtain 

S e (f,T) = lim Ie { f (A*/(X fl+/l (t)) - A e /(A > 8 (t))) 
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Note that part (B) of Condition 2.2 says that and dXk/09 satisfy Condition 2.1. Moreover by part (C) 
of Condition 2.2, for some e > 0, sup g , £ ^ g _ e g+ ^ \d 2 Xk(-,0')/d9 2 \ also satisfies Condition 2.1. Since both 

Xg and X g+ h converge to Xg almost surely as h — > 0, we can use part (A) of Lemma 4.1 along with the 
dominated convergence theorem to conclude that the third limit on the right of the above expression is 
and the second limit is equal to 



fe=i 



Therefore 



S e (f,T) = f> (jf dXk{X d f' d) A Ck f(X e (t))dt j +P (9) (4.36) 



where 



P {6) = lim ~E (jf (A e /(l e+/l (i)) - Kgf(X e (t)j) dtj . (4.37) 
For each k = 1 , . . . , if let 

r£ =inf jf > : y, (1) (7 (A fc (X fl (a),fl) - \T m (Ms), X e+h (s), 6,h)) ds 
+Yl 2) ( f (Xk(X e+h (s),0 + h) - Xf in (X e (s),X e+h (s),e,h)) 



ds > 1 



Then the first time Xg and Xg+h have a different state is given by 



r' l =minr^. (4.38) 

k 



Almost sure convergence of Xq and Xg+h to Xg implies that r h — > oo as h — > 0. We can write 

p(0)=Hm±E(jf h (A e f(X e+h (t)) - Agf(X e (t))) dt \ =limEE^W' ( 4 ' 39 ) 



h— >0 /l \ JxAr h 

where 



= E (l K<T , <<T , +i)T , =r , } jf ^ (A e /(X e+h (t)) - A«/(X fl (*))) dt] (4.40) 



and erf is the i-th jump time of the process Xg (with ctq =0). Let {J-j?} be the filtration generated by the 
processes Xg and Xg + h- Note that since Poisson processes are strongly Markov, the processes Xg and Xg + h 
are also strongly Markov. Conditioning with respect to •7 7 tat'» an d using part (B) of Lemma 4.1 along with 
the strong Markov property we get 



pf k (0) = E ( ]l K , <Th< ^ +iiTh=r , } E ( J ^ (kgf{Xg +h (t)) - Agf(Xg(t))) d 



AT 
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E 



(V?<T'<<i,' J, =rJ} E ( D f( x e+h,T A T h ,r) - Df(Xe,T A r h ,T) J T ). AT jj , 



where D f (X,s,t) = f(X(t)) - f(X(s)). Let 7^ = (r h - erf) A (of +1 - erf). Given t' 1 > erf and X e (of) = a;, 
is exponentially distributed with rate 

r(x, 9, h) = X Q (x, 9) + X (x, 9 + h) - X^ m (x, x, 9, h) 

and the probability of the event E^ k = {r h < crf +1 , t h = rj}} = {r h = erf + 7^, r h = t£} is given by 

X k (x, 9) + X k (x, 9 + h)- 2X k n ™{x, x, 9, h) 



Using Taylor's theorem we can write 



r(x, 9, h) 



r(x, 9, h) 
dX k (x,9) 



h + lUh,x)h), 



where l e ik {h : x) — > as h — >• 0. Given erf < r h < T and Ag(crf ) = x, on the event E\ we have 

(X e (r ),X e+h (r )) - j (a; + Cfe;X) if AfeM + ft) < AfeM ). 
Recall the definition of VPg from (2.7). Let 

i?f (z, /, t, k) = [ (Vg +h (x + (k,f,t-s)- 9 e (x, f,t-s)- Ac f{x)) e- r ^ e ' h >ds 

+ Ck, /, s) - Vg(x, f, s) - A c J(x)) e-^-W-W 



Using (4.41) one can see that 



lim ^E (l E HE (Df(X e +h,T A r h ,T) - D f (Xg, T A r h , T) |-F r , AT ) \x {p$) = x) 



lim E 

h-*0 



^Efe/,T-^AT 1& ) 



The above relation will also hold when (dX k (x, 9)/d9) < 0. We can now conclude that 

dx k (x e (4),e) ~ h . v , h . 



lim eM) = Um ] 

/i-»0 ft /i->-0 
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-R$(Xg(a?),f,T-a?AT,k) 



(4.41) 



(4.42) 



Suppose that for some x € Nq we have (dXk{x,6)/d6) > 0. This implies that Xk(x,9 + h) > X k {x,9) for h 
close to 0. Since 7^ is exponentially distributed with rate r(x,9,h) we obtain 

^E (l^E (D/(I w ,TA T h ,T)-D f (X e ,TA r h ,T)\j^ T n AT ) |Xe(crf) = a?) 
= ±E (p^(x)r(x, 9, h)R h e {x, f,T- af A T, fc) 



As /i — > 0, the processes Xg and Xg+zt converge almost surely to Xg. This implies that as h — > 0, erf — > cfj 
and r(Xg(a^),9,h) — > Xo(Xg(<Ji),8) almost surely, where Ui is the i-th. jump time of the process Xg. Also 
recall that r h — > 00 as /i — > 0. Therefore using the continuous mapping theorem, one can see that 



h-yO h 



d\ k (X B (<Ti),0) 
89 



Rg(Xe(ai),f,T-aiAT,k) 



(4.43) 
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where the function Rg is given by (2.8). Using (4.36), (4.39) and (4.43) we obtain 




i=0:<Ti<T 

This completes the proof of Theorem 2.3 . 



□ 



5 Conclusions and Future Work 

In this paper we present a new approach for obtaining unbiased estimates for the parameter sensitivities in 
a stochastic chemical reaction network. Our approach depends on a sensitivity formula that is derived using 
the random time change representation of Kurtz and the coupling introduced in [1]. The derived formula 
involves several quantities that cannot be directly computed from the trajectories of the reaction dynamics 
and hence they need to be estimated through simulations. We present a procedure called the Auxiliary Path 
Algorithm (APA) that efficiently estimates all the required quantities using a fixed number of auxiliary paths. 
It relies on the fact that a Markov process representing chemical kinetics is such that its different paths visit 
the same states again and again. With APA we can generate samples that can be used for estimating the 
parameter sensitivity. 

Using the birth-death model (see Example 3.1) and the gene expression network (see Example 3.2), we 
compare APA to the Girsanov method [19], which is the only other method known to produce unbiased 
estimates for the parameter sensitivity. Our results show that APA is considerably faster than the Girsanov 
method when the sensitivity is estimated with respect to a reaction rate constant which is small in size. 
Such a rate constant would correspond to a reaction which is slow in the reference time-scale of the system. 
Many biological networks have such slow reactions and this makes our method useful for sensitivity analysis. 
The main drawbacks of APA is that it is hard to implement and requires memory of a size proportional to 
the number of jumps in a typical path. 

Our method assumes that we can efficiently simulate the paths of the reaction network using Gillespie's 
Stochastic Simulation Algorithm [10]. However this is not true when the chemical system has reactions that 
are taking place at vastly different time-scales (see [3, 25, 5]). In such situations, our method will perform 
poorly but so will all the other methods for estimating the parameter sensitivity. Recently Kang and Kurtz 
[15] have developed a formal framework for reducing stochastic reaction models by exploiting the time-scale 
separation between the various reactions. These reduced models approximately capture the dynamics of the 
original model and they can be efficiently simulated using a variant of the standard Stochastic Simulation 
Algorithm (see [3, 25, 5]). It is natural to ask if one can just simulate the reduced model and approximately 
estimate the parameter sensitivity for the original model. In an upcoming paper we will show that this can 
indeed be done under some conditions. That result will follow from ideas that are similar to the ones used 
in the proof of Theorem 2.3. 

In some applications, one is interested in estimating the second order derivatives (the Hessian) with 
respect to the model parameters. One such application is optimization over the parameter space, where the 
Hessian is required to implement the Newton-Raphson scheme. Our approach in this paper can also yield 
exact expressions for the second order derivatives. One can then use such expressions to build unbiased 
estimators for the Hessian. We hope to do that in a future paper. 
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